Final_Project_Ds2020
Global Air Travel Analysis using OpenFlights
Lucas Martins Sorge, Nina De Grandis, Brandon Merrick
Introduction
This data science project explores global air travel patterns using
datasets from the OpenFlights database. We analyze airline networks,
airport connectivity, geographic coverage, and operational
characteristics to reveal insights into global aviation trends. Our goal
during this project is to find when and where air traffic is more
concentrated. We also want to analyze the structure of global airline
route networks, examine geographic coverage and identify underserved
regions, and study operational characteristics, including fleet usage
and route lengths.
Questions: - How concentrated is global air traffic? - How does
airport connectivity vary between developed and developing countries? -
What country has the most airports? Where are the countries with the
most airports located, and what patterns are there? - Brandon’s -
Brandon’s
Data Sources
Data was obtained from OpenFlights: -
airlines.dat: Airlines data including operational status. -
airports.dat: Airport location and operational details. -
routes.dat: Flight routes between airports. -
planes.dat: Aircraft types and equipment information. -
countries.dat: Country codes and geographic metadata.
Project Objectives
- Analyze the structure of global airline route networks.
- Examine geographic coverage and identify underserved regions.
- Study operational characteristics, including fleet usage and route
lengths.
Completed Steps
- Data loading and cleaning:
- Handling missing and invalid data.
- Filtering for active airlines and valid airports.
- Joining datasets (routes, airlines, airports).
Methodology
- Our analysis was conducted entirely in R,
leveraging a combination of data wrangling, statistical modeling, and
visualization techniques.
Results
Question 1: How concentrated is global air
traffic?
- Extreme airport-level inequality
- Lorenz Curve bows sharply below the line of perfect equality,
indicating most flights funnel through a few major hubs.
- Gini coefficient (airports): 0.78
knitr::include_graphics("figures/lorenz_airport.png")

- Top-percentile shares
- Top 1% of airports handle ~20% of
all flights
- Top 5% handle ~53% of all
flights
- Top 10% handle ~70% of all
flights
- Leading hubs
- The busiest airports—Atlanta (ATL), Chicago
O’Hare (ORD), Beijing Capital (PEK),
etc.—together account for a disproportionately large share of global
traffic.
- Route-level distribution
- Lorenz Curve for routes lies closer to the equality line, showing a
more even spread across connections.
- Gini coefficient (routes): 0.31
knitr::include_graphics("figures/lorenz_route.png")

- Key routes
- Top connections (e.g., ORD → ATL, JFK →
LHR) are busiest but represent a smaller overall share compared
to top airports.
- Conclusion
- Global aviation has a dual structure: a small
number of dominant hubs manage the bulk of air traffic, while a wide
range of routes ensures broad global connectivity and operational
resilience.
Question 2: How does airport connectivity vary
between developed and developing countries?
- Stronger connectivity in developed countries
- Airports in developed countries show significantly higher average
connectivity than those in developing countries.
- Average connectivity (number of outgoing routes per
airport) is roughly twice as high in developed
nations.
- Statistical evidence
- Welch Two Sample t-test: p = 0.0079
- Mann-Whitney U test: p = 0.0228
- Both confirm that the difference is statistically
significant.
- Top connected countries
- Most countries with the highest average connectivity (e.g.,
United States, Germany,
France) are developed.
- Some developing countries like the
UAE and Singapore stand out as
exceptions due to geographic or economic advantages.
- Visual insights
- A global map of airports shows large hubs (colored by development
status) are clustered in North America, Europe, and East
Asia.
- Airports in developing countries tend to be more regionally
focused with lower integration into global flight
networks.
knitr::include_graphics("figures/global_connectivity_map.png")

- Conclusion
- Global airport connectivity reflects broader economic
inequalities.
- Developed countries are far more integrated into the air
transportation network, both in infrastructure and route diversity.
Question 3: What country has the most airports?
Where are the countries with the most airports located, and what
patterns are there?
Question 4:
Question 5:
Conclusion